Speech Translation with Grammar Driven Probabilistic Phrasal Bilexica Extraction
Authors
Abstract
We introduce a new type of transduction grammar that allows probabilistic phrasal bilexica to be learned, leading to a significant improvement in spoken language translation accuracy. The current state of the art in statistical machine translation relies on a complicated and crude pipeline to learn probabilistic phrasal bilexica, the very core of any speech translation system. In this paper, we present a more principled approach to learning probabilistic phrasal bilexica, based on stochastic transduction grammar learning applicable to speech corpora.
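To make the central data structure concrete: a probabilistic phrasal bilexicon can be viewed as a table mapping each source phrase to a distribution over target phrases. The minimal sketch below assumes, hypothetically, that a trained stochastic transduction grammar exposes its biterminal (phrase-pair) rules together with their probabilities, and reads a conditional bilexicon p(target | source) off them by renormalising per source phrase. The phrase pairs, the weights, and the name `extract_bilexicon` are invented for illustration; this is not the estimation procedure of the paper.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical biterminal rules X -> source/target with invented probabilities.
BITERMINALS: Dict[Tuple[str, str], float] = {
    ("thank you", "merci"): 0.30,
    ("thank you", "merci beaucoup"): 0.10,
    ("good morning", "bonjour"): 0.20,
    ("morning", "matin"): 0.05,
}


def extract_bilexicon(
    biterminals: Dict[Tuple[str, str], float]
) -> Dict[str, Dict[str, float]]:
    """Turn biterminal weights into p(target | source) by per-source renormalisation."""
    # Total weight of all biterminals sharing each source phrase.
    totals: Dict[str, float] = defaultdict(float)
    for (src, _tgt), weight in biterminals.items():
        totals[src] += weight
    # Divide each pair's weight by its source total so each row sums to 1.
    bilexicon: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (src, tgt), weight in biterminals.items():
        bilexicon[src][tgt] = weight / totals[src]
    return dict(bilexicon)


if __name__ == "__main__":
    for src, dist in sorted(extract_bilexicon(BITERMINALS).items()):
        for tgt, p in sorted(dist.items(), key=lambda kv: -kv[1]):
            print(f"{src!r} -> {tgt!r}: {p:.2f}")
```

Renormalising per source phrase is only one choice; a joint or target-conditioned table could be read off the same weights.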
Similar resources
Principled Induction of Phrasal Bilexica
We aim to replace the long and complicated pipeline employed to produce probabilistic phrasal bilexica with a theoretically principled, grammar-based approach. To this end, we introduce a learning regime for a phrasal grammar equivalent to linear transduction grammars. The stochastic version of this new grammar type also has the property that the set of biterminals constitutes a natural p...
Grammarless Extraction of Phrasal Translation Examples from Parallel Texts
We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modelin...
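The ITG formalism referred to above combines constituents in one of two orientations: straight, written [A B], where both languages keep the same order, and inverted, written ⟨A B⟩, where the order is reversed on the target side. The following minimal sketch shows how a bracketing tree yields a sentence pair under that rule. The class names and the example phrase pairs are hypothetical, and the sketch does not reproduce the extraction method of the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """A lexical biterminal pairing a source phrase with a target phrase."""
    src: List[str]
    tgt: List[str]


@dataclass
class Node:
    """A binary ITG node: inverted=True means <A B>, otherwise [A B]."""
    left: "Tree"
    right: "Tree"
    inverted: bool = False


Tree = Union[Leaf, Node]


def yield_pair(tree: Tree) -> Tuple[List[str], List[str]]:
    """Return the (source, target) word sequences generated by a tree."""
    if isinstance(tree, Leaf):
        return list(tree.src), list(tree.tgt)
    left_src, left_tgt = yield_pair(tree.left)
    right_src, right_tgt = yield_pair(tree.right)
    # The source side is always concatenated left to right; the target side
    # keeps that order for straight nodes and swaps it for inverted ones.
    tgt = right_tgt + left_tgt if tree.inverted else left_tgt + right_tgt
    return left_src + right_src, tgt


if __name__ == "__main__":
    # English "the red car" vs. French "la voiture rouge": the phrase
    # "red car" is built with an inverted node.
    tree = Node(
        Leaf(["the"], ["la"]),
        Node(Leaf(["red"], ["rouge"]), Leaf(["car"], ["voiture"]), inverted=True),
    )
    print(yield_pair(tree))  # (['the', 'red', 'car'], ['la', 'voiture', 'rouge'])
```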
Approach to Automatic Translation Template Acquisition Based on Unannotated Bilingual Grammar Induction
In this paper, we propose a new approach that automatically acquires translation templates from unannotated bilingual spoken language corpora in the travel information domain. The approach adopts two basic algorithms: a grammar induction algorithm and a dynamic programming algorithm. Our approach is an unsupervised, statistical, data-driven method which avoids the...
Probabilistic dialogue act extraction for concept based multilingual translation systems
This paper describes a probabilistic method for dialogue act (DA) extraction for concept-based multilingual translation systems. A DA is a unit of a semantic interlingua, consisting of speaker information, a speech act, a concept, and an argument. Probabilistic models for the extraction of speech acts or concepts are trained as speech-act- or concept-dependent word n-gram models. The proposed method...
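The abstract above states that speech-act (or concept) extractors are trained as speech-act- or concept-dependent word n-gram models. A minimal sketch of that idea might train one bigram model per label and pick the label with the highest log prior plus log likelihood. Bigrams, add-one smoothing, and this decision rule are all assumptions here, since the abstract is truncated before the method is detailed; the speech-act labels and training sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical training sentences grouped by speech act.
TRAINING_DATA: Dict[str, List[List[str]]] = {
    "request-information": [["do", "you", "have", "a", "room"], ["is", "there", "a", "bus"]],
    "give-information": [["i", "have", "a", "reservation"], ["my", "name", "is", "smith"]],
}


class BigramModel:
    """Add-one-smoothed word bigram model for one speech act (or concept)."""

    def __init__(self, sentences: List[List[str]], vocab_size: int):
        self.vocab_size = vocab_size
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        for words in sentences:
            padded = ["<s>"] + words + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1

    def log_prob(self, words: List[str]) -> float:
        """Sum of smoothed log bigram probabilities over the padded sentence."""
        padded = ["<s>"] + words + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.vocab_size))
            for p, c in zip(padded, padded[1:])
        )


def train(
    data: Dict[str, List[List[str]]]
) -> Tuple[Dict[str, BigramModel], Dict[str, float]]:
    """Train one bigram model per label and compute log label priors."""
    vocab = {w for sents in data.values() for s in sents for w in s} | {"</s>"}
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    total = sum(len(sents) for sents in data.values())
    models = {label: BigramModel(sents, vocab_size) for label, sents in data.items()}
    priors = {label: math.log(len(sents) / total) for label, sents in data.items()}
    return models, priors


def classify(
    words: List[str], models: Dict[str, BigramModel], priors: Dict[str, float]
) -> str:
    """Return the label maximising log prior plus n-gram log likelihood."""
    return max(models, key=lambda label: priors[label] + models[label].log_prob(words))


if __name__ == "__main__":
    models, priors = train(TRAINING_DATA)
    print(classify(["do", "you", "have", "a", "single", "room"], models, priors))
```

The same pattern would apply to concept extraction by grouping the training sentences by concept instead of by speech act.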
Expressive Hierarchical Rule Extraction for Left-to-Right Translation
Left-to-right (LR) decoding (Watanabe et al., 2006) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero) that visits input spans in arbitrary order while producing the output translation in left-to-right order. This leads to far fewer language model calls. However, the constrained SCFG used in LR-Hiero (GNF), with at most two non-terminals, is unable to account for some...
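The GNF restriction mentioned above can be illustrated with a small rule filter. The sketch assumes, as LR-Hiero is commonly described, that a rule's target side must be a non-empty string of terminals followed only by non-terminals, and it applies the limit of two non-terminals stated in the abstract. The `X1`/`X2` token notation and the function names are hypothetical. A standard Hiero target side such as `X1 of X2` fails the check, illustrating the kind of rule the constrained grammar excludes.

```python
import re
from typing import List

# Hypothetical notation: non-terminals are written X1, X2, ...; all other tokens are terminals.
NONTERMINAL = re.compile(r"X\d+")


def is_nonterminal(token: str) -> bool:
    return NONTERMINAL.fullmatch(token) is not None


def is_gnf_target(target: List[str], max_nonterminals: int = 2) -> bool:
    """Return True if target is a terminal prefix followed only by non-terminals."""
    prefix_len = 0
    while prefix_len < len(target) and not is_nonterminal(target[prefix_len]):
        prefix_len += 1
    if prefix_len == 0:
        return False  # the target side must open with at least one terminal
    suffix = target[prefix_len:]
    return len(suffix) <= max_nonterminals and all(is_nonterminal(t) for t in suffix)


if __name__ == "__main__":
    for rule in (["the", "house", "X1"], ["of", "X1", "X2"], ["X1", "of", "X2"], ["a", "X1", "b"]):
        print(" ".join(rule), "->", is_gnf_target(rule))
```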